Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

a74nh · 2023-12-01T13:59:54Z

The next batch.

MOV is an alias for CPY. Due to the scripting there are two different opcodes even though they have the same encodings. I think that's fine. I change CPY to the preferred MOV in emit_R_R_R()

For some of these I needed INS_OPTS_SCALABLE_n_FROM_SIMD_(SCALAR/VECTOR). But, doing that increases insOpts to 7 bits, which breaks the hardcoded sizings in one of the structures. Instead, I repurposed INS_OPTS_SCALABLE_n_TO_SIMD_(SCALAR/VECTOR) to INS_OPTS_SCALABLE_n_WITH_SIMD_(SCALAR/VECTOR). I don't think we ever need to distinguish between the two.

Fixed two of the perf scores for other instructions.

The use of insOpts and emitAttr seems to have resolved into:

If there are no fixed sized (GP or NEON) registers in the instruction then emitAttr is EA_SCALABLE.
If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes) register (8 bytes).
If there is a fixed sized general or FP register in the instruction, then emitAttr is the size of the the register (4,8 bytes for general, 1,2,4,8 bytes for FP).
insOpts states the size of the scalable register elements.
insOpts states if there are additional register types in the instruction (but not the size of those additional types).

EDIT: Fixed vector size above.

ghost · 2023-12-01T14:00:04Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The next batch.

MOV is an alias for CPY. Due to the scripting there are two different opcodes even though they have the same encodings. I think that's fine. I change CPY to the preferred MOV in emit_R_R_R()

For some of these I needed INS_OPTS_SCALABLE_n_FROM_SIMD_(SCALAR/VECTOR). But, doing that increases insOpts to 7 bits, which breaks the hardcoded sizings in one of the structures. Instead, I repurposed INS_OPTS_SCALABLE_n_TO_SIMD_(SCALAR/VECTOR) to INS_OPTS_SCALABLE_n_WITH_SIMD_(SCALAR/VECTOR). I don't think we ever need to distinguish between the two.

Fixed two of the perf scores for other instructions.

The use of insOpts and emitAttr seems to have resolved into:

If there are no fixed sized (GP or NEON) registers in the instruction then emitAttr is EA_SCALABLE.
If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).
If there is a fixed sized general or FP register in the instruction, then emitAttr is the size of the the register (4,8 bytes for general, 1,2,4,8 bytes for FP).
insOpts states the size of the scalable register elements.
insOpts states if there are additional register types in the instruction (but not the size of those additional types).

Author:	a74nh
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `community-contribution`
Milestone:	-

a74nh · 2023-12-01T14:04:09Z

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

a74nh · 2023-12-01T15:11:22Z

@kunalspathak @BruceForstall

BruceForstall · 2023-12-02T00:14:23Z

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

Isn't it nonsensical to have a use of sp that isn't a full-sized 8-byte use? Wouldn't "wsp" indicate a 4-byte use? Does that ever happen?

BruceForstall · 2023-12-02T00:16:22Z

If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).

Is this how it is today for non-SVE? (instead of using insOpts for the element size)

kunalspathak

LGTM

src/coreclr/jit/registerarm64.h

kunalspathak · 2023-12-02T03:26:25Z

btw, do you mind posting the capstone vs. coreclr disassembly as well?

a74nh · 2023-12-04T09:05:01Z

capstone output (identical to clr output):

and z0.b, p1/m, z0.b, z2.b
bic z3.h, p4/m, z3.h, z5.h
eor z14.s, p5/m, z14.s, z16.s
orr z29.d, p7/m, z29.d, z31.d
add z5.b, p6/m, z5.b, z7.b
sub z15.h, p7/m, z15.h, z29.h
subr z2.s, p0/m, z2.s, z13.s
sdiv z3.s, p2/m, z3.s, z9.s
sdivr z31.d, p3/m, z31.d, z29.d
udiv z1.s, p0/m, z1.s, z0.s
udivr z13.d, p7/m, z13.d, z15.d
smax z24.b, p0/m, z24.b, z2.b
smin z9.h, p1/m, z9.h, z27.h
sabd z5.b, p2/m, z5.b, z6.b
uabd z23.s, p3/m, z23.s, z9.s
umax z15.s, p4/m, z15.s, z2.s
umin z12.d, p7/m, z12.d, z0.d
mul z5.d, p1/m, z5.d, z3.d
smulh z17.s, p5/m, z17.s, z5.s
umulh z12.b, p2/m, z12.b, z24.b
asr z5.s, p0/m, z5.s, z21.s
asrr z1.b, p7/m, z1.b, z20.b
lsl z0.h, p2/m, z0.h, z0.h
lslr z27.d, p6/m, z27.d, z31.d
lsr z5.b, p5/m, z5.b, z6.b
lsrr z15.s, p4/m, z15.s, z17.s
asr z4.b, p3/m, z4.b, z24.d
lsl z19.h, p7/m, z19.h, z3.d
lsr z0.s, p0/m, z0.s, z0.d
clasta z31.b, p7, z31.b, z31.b
clastb z30.d, p6, z30.d, z30.d
clasta h12, p1, h12, z15.h
clastb s13, p2, s13, z16.s
clastb d14, p0, d14, z17.d
clasta w0, p0, w0, z0.b
clasta w1, p2, w1, z3.h
clastb w23, p5, w23, z12.s
clastb x3, p6, x3, z9.d
shadd z15.b, p0/m, z15.b, z10.b
shsub z16.h, p1/m, z16.h, z11.h
shsubr z17.s, p2/m, z17.s, z12.s
srhadd z18.d, p3/m, z18.d, z13.d
uhadd z19.b, p4/m, z19.b, z14.b
uhsub z20.h, p5/m, z20.h, z15.h
uhsubr z21.s, p6/m, z21.s, z16.s
urhadd z22.d, p7/m, z22.d, z17.d
addp z23.b, p6/m, z23.b, z18.b
smaxp z24.h, p5/m, z24.h, z19.h
sminp z25.s, p4/m, z25.s, z20.s
umaxp z26.d, p3/m, z26.d, z21.d
uminp z27.b, p2/m, z27.b, z22.b
sqadd z28.b, p1/m, z28.b, z23.b
sqsub z29.h, p0/m, z29.h, z24.h
sqsubr z30.h, p1/m, z30.h, z25.h
suqadd z31.b, p2/m, z31.b, z26.b
uqadd z0.s, p3/m, z0.s, z27.s
uqsub z1.d, p4/m, z1.d, z28.d
uqsubr z2.b, p5/m, z2.b, z29.b
usqadd z3.b, p6/m, z3.b, z30.b
sqrshl z4.b, p7/m, z4.b, z31.b
sqrshlr z5.h, p0/m, z5.h, z30.h
sqshl z6.s, p1/m, z6.s, z29.s
sqshlr z7.d, p2/m, z7.d, z28.d
srshl z8.b, p3/m, z8.b, z27.b
srshlr z9.h, p4/m, z9.h, z26.h
uqrshl z10.s, p5/m, z10.s, z25.s
uqrshlr z11.d, p6/m, z11.d, z24.d
uqshl z12.b, p7/m, z12.b, z23.b
uqshlr z13.h, p0/m, z13.h, z22.h
urshl z14.s, p1/m, z14.s, z21.s
urshlr z15.d, p2/m, z15.d, z20.d
faddp z16.h, p3/m, z16.h, z19.h
fmaxnmp z17.s, p4/m, z17.s, z18.s
fmaxp z18.d, p5/m, z18.d, z17.d
fminnmp z19.s, p6/m, z19.s, z16.s
fminp z20.h, p7/m, z20.h, z15.h
fadda h21, p6, h21, z14.h
fadda s22, p5, s22, z13.s
fadda d23, p4, d23, z12.d
fabd z24.h, p3/m, z24.h, z11.h
fadd z25.s, p2/m, z25.s, z10.s
fdiv z28.s, p0/m, z28.s, z7.s
fdivr z29.d, p1/m, z29.d, z6.d
fmax z30.h, p2/m, z30.h, z5.h
fmaxnm z31.s, p3/m, z31.s, z4.s
fmin z0.d, p4/m, z0.d, z3.d
fminnm z1.h, p5/m, z1.h, z2.h
fmul z2.s, p6/m, z2.s, z1.s
fmulx z3.d, p7/m, z3.d, z0.d
fscale z4.h, p6/m, z4.h, z31.h
fsub z5.s, p5/m, z5.s, z30.s
fsubr z6.d, p4/m, z6.d, z29.d
andv b0, p0, z0.b
eorv h1, p1, z1.h
orv s2, p2, z2.s
orv d3, p3, z3.d
saddv d1, p4, z2.b
saddv d2, p5, z3.h
uaddv d3, p6, z4.s
smaxv d15, p7, z4.d
sminv s16, p6, z14.s
umaxv h17, p5, z24.h
uminv b18, p4, z31.b
cls z31.b, p0/m, z0.b
clz z30.h, p1/m, z1.h
cnot z29.s, p2/m, z2.s
cnt z28.d, p3/m, z3.d
fabs z27.h, p4/m, z4.h
fneg z26.s, p5/m, z5.s
not z25.b, p6/m, z6.b
abs z24.b, p7/m, z7.b
neg z23.s, p0/m, z8.s
sxtb z22.h, p1/m, z9.h
sxtb z22.s, p1/m, z9.s
sxtb z22.d, p1/m, z9.d
sxth z21.s, p2/m, z10.s
sxth z21.d, p2/m, z10.d
sxtw z20.d, p3/m, z11.d
uxtb z19.h, p4/m, z12.h
uxtb z19.s, p4/m, z12.s
uxtb z19.d, p4/m, z12.d
uxth z18.s, p5/m, z13.s
uxth z18.d, p5/m, z13.d
uxtw z17.d, p6/m, z14.d
compact z16.s, p7, z13.s
compact z15.d, p0, z12.d
mov z14.b, p1/m, b11
mov z13.s, p2/m, s10
mov z12.h, p3/m, h9
mov z11.d, p4/m, d8
mov z10.d, p5/m, sp
mov z9.h, p6/m, w30
mov z8.s, p7/m, w29
mov z7.b, p0/m, w28
lasta b6, p1, z27.b
lasta h5, p2, z26.h
lastb s4, p3, z25.s
lastb d3, p4, z24.d
lasta w1, p5, z23.b
lasta w0, p6, z22.s
lastb w30, p7, z21.h
lastb fp, p0, z20.d
rbit z28.h, p1/m, z19.h
rbit z28.b, p1/m, z19.b
rbit z28.s, p1/m, z19.s
rbit z28.d, p1/m, z19.d
revb z27.h, p2/m, z18.h
revb z27.s, p2/m, z18.s
revb z27.d, p2/m, z18.d
revh z26.s, p3/m, z17.s
revh z26.d, p3/m, z17.d
revw z25.d, p4/m, z16.d
nop

a74nh · 2023-12-04T09:11:47Z

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

Isn't it nonsensical to have a use of sp that isn't a full-sized 8-byte use? Wouldn't "wsp" indicate a 4-byte use? Does that ever happen?

Yeah, agreed, we would never actually used it. Revered and switched test to 8byte.

a74nh · 2023-12-04T09:31:08Z

If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).

Is this how it is today for non-SVE? (instead of using insOpts for the element size)

Ah, no non-SVE uses the vector size. Looking at my code again, everything with a SIMD vector is currently one of the unsupported instructions, which is why I hadn't seen any issues.

Fixed the tests and code so that is uses size of the vector.

a74nh · 2023-12-04T13:35:12Z

CI failures don't look connected. (This patch should have no effect on any existing tests)

kunalspathak · 2023-12-04T14:29:56Z

superpmi failures are unrelated and we are seeing it in other PRs as well.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 1, 2023

ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 1, 2023

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A

b56c574

a74nh force-pushed the IF_SVE_CL_3A_github branch from a5f514c to b56c574 Compare December 1, 2023 15:05

a74nh marked this pull request as ready for review December 1, 2023 15:09

kunalspathak mentioned this pull request Dec 1, 2023

Arm64: Implement SVE encodings #94549

Closed

kunalspathak approved these changes Dec 2, 2023

View reviewed changes

src/coreclr/jit/registerarm64.h Outdated Show resolved Hide resolved

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Dec 2, 2023

Revert wsp printing

da59654

Fix up sizes for WITH_SIMD_VECTOR instructions

8d74718

kunalspathak merged commit 95738f3 into dotnet:main Dec 4, 2023
123 of 129 checks passed

a74nh deleted the IF_SVE_CL_3A_github branch December 4, 2023 15:59

github-actions bot locked and limited conversation to collaborators Jan 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

a74nh commented Dec 1, 2023 •

edited

Loading

ghost commented Dec 1, 2023

a74nh commented Dec 1, 2023

a74nh commented Dec 1, 2023

BruceForstall commented Dec 2, 2023

BruceForstall commented Dec 2, 2023

kunalspathak left a comment

kunalspathak commented Dec 2, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

kunalspathak commented Dec 4, 2023

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

Conversation

a74nh commented Dec 1, 2023 • edited Loading

ghost commented Dec 1, 2023

a74nh commented Dec 1, 2023

a74nh commented Dec 1, 2023

BruceForstall commented Dec 2, 2023

BruceForstall commented Dec 2, 2023

kunalspathak left a comment

Choose a reason for hiding this comment

kunalspathak commented Dec 2, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

a74nh commented Dec 4, 2023

kunalspathak commented Dec 4, 2023

a74nh commented Dec 1, 2023 •

edited

Loading